Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)


Figure 5.17 shows the boxplot of the TMM-normalized data for each sample. The black line dividing each box marks the median of the count data, the top of the box marks the upper quartile, and the bottom of the box marks the lower quartile. The whiskers extend to the most extreme values that lie within 1.5 times the interquartile range of the box, and the circles beyond the whiskers indicate outliers.
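The whisker and outlier rule can be checked with base R's boxplot.stats(), which computes the statistics that boxplot() draws with its default range of 1.5. The box edges it reports are Tukey's hinges, which are close to, but not always identical with, the quartiles. The vector below is a made-up example, not data from this chapter.

```r
# Toy log2 count values for one hypothetical sample (not real data)
x <- c(1, 2, 3, 4, 5, 100)

# boxplot.stats() returns what boxplot() draws:
# $stats = lower whisker, lower hinge, median, upper hinge, upper whisker
# $out   = values beyond 1.5 x the hinge spread, drawn as circles
s <- boxplot.stats(x, coef = 1.5)
print(s$stats)  # 1.0 2.0 3.5 5.0 5.0
print(s$out)    # 100
```

Here the upper whisker stops at 5, the largest value inside the 1.5 x IQR fence, and 100 is reported as an outlier rather than as the end of the whisker.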

We can use multidimensional scaling (MDS) [35] to represent the relationships between samples graphically and to show the overall differences between the gene expression profiles of the different samples. MDS works from the pairwise Euclidean distances between samples, measured in terms of the leading log-fold change (logFC) of the genes that best distinguish each pair of samples. The leading logFC is the root-mean-square of the largest log2-fold changes between the pair of samples (by default the top 500 genes, top=500). edgeR provides the plotMDS() function to draw the MDS plot.

The general syntax is as follows:

plotMDS(x, top = 500, gene.selection = "pairwise", method = "logFC", ...)
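plotMDS() does this calculation internally. As a hedged illustration of the definition above (not edgeR's own code), the base R function below computes the leading logFC for one pair of samples. It assumes a and b hold log2 expression values for the same genes in the same order; the name leading_logfc and the toy values are hypothetical. As with the top argument of plotMDS(), only the top largest squared differences enter the average.

```r
# Leading log-fold change between two samples (hypothetical helper)
leading_logfc <- function(a, b, top = 500) {
  d2 <- (a - b)^2                        # squared log2-fold change per gene
  top <- min(top, length(d2))            # cannot use more genes than exist
  largest <- sort(d2, decreasing = TRUE)[seq_len(top)]
  sqrt(mean(largest))                    # root-mean-square of the top changes
}

# Toy log2 expression values for four genes in two samples
s1 <- c(0, 0, 0, 0)
s2 <- c(3, 4, 0, 1)
print(leading_logfc(s1, s2, top = 2))  # sqrt((16 + 9) / 2) = 3.535534
```

With top = 2 only the two largest changes (4 and 3) count; the genes that barely differ between the two samples do not dilute the distance.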

The samples are then represented in two dimensions such that the distance between points on the plot approximates their multivariate dissimilarity: samples that lie close together on the MDS plot are more similar than samples that lie far apart. In our example data, if the normal and tumor samples differ in their expression profiles, we expect them to form separate clusters on the plot.
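To see how the pairwise distances become a two-dimensional plot, the sketch below reproduces the idea in base R on simulated data: it fills a matrix of leading-logFC distances between samples and passes it to classical MDS via base R's cmdscale(). It is an illustration under stated assumptions, not edgeR's implementation; the simulated matrix, the three-versus-three design, the shift of 4 log2 units in 100 genes and top = 50 are all made up for the example.

```r
# Simulated log2 expression: 1000 genes x 6 samples (made-up data)
set.seed(1)
n_genes <- 1000
group <- rep(c("normal", "tumor"), each = 3)
logexpr <- matrix(rnorm(n_genes * 6, mean = 5, sd = 0.5), nrow = n_genes)
colnames(logexpr) <- paste0(group, c(1:3, 1:3))   # normal1..3, tumor1..3

# Give the first 100 genes a 4-unit log2 increase in the tumor samples
tumor <- group == "tumor"
logexpr[1:100, tumor] <- logexpr[1:100, tumor] + 4

# Root-mean-square of the `top` largest squared log2-fold changes
leading_logfc <- function(a, b, top = 500) {
  d2 <- sort((a - b)^2, decreasing = TRUE)
  sqrt(mean(d2[seq_len(min(top, length(d2)))]))
}

# Symmetric matrix of pairwise distances between samples
n <- ncol(logexpr)
dd <- matrix(0, n, n, dimnames = list(colnames(logexpr), colnames(logexpr)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    dd[i, j] <- leading_logfc(logexpr[, i], logexpr[, j], top = 50)
  }
}

# Classical MDS: 2-D coordinates whose distances approximate dd
coords <- cmdscale(as.dist(dd), k = 2)
print(round(coords, 2))
```

Every within-group distance comes out smaller than every between-group distance, so plot(coords, col = ifelse(group == "normal", "blue", "red"), pch = 16) shows the two groups as separate clusters, which is the pattern described above.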

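# Install RColorBrewer only if it is not already available
if (!requireNamespace("RColorBrewer", quietly = TRUE))
  install.packages("RColorBrewer")
library(RColorBrewer)
library(edgeR)  # edgeR (and limma, which it loads) provide plotMDS()
# yNorm and sampleinfo are created in the earlier steps of this example
png(file = "MDSPlot.png")
# log2 counts with a pseudo-count of 1 to avoid log2(0); note these are raw
# counts, so TMM factors are not applied (cpm(yNorm, log = TRUE) would apply them)
pseudoCounts <- log2(yNorm$counts + 1)
# One colour per condition (normal, tumor)
colConditions <- brewer.pal(3, "Set2")
colConditions <- colConditions[match(sampleinfo$condition,
                                     levels(factor(sampleinfo$condition)))]
# One plotting symbol per patient
patients <- c(8, 15, 16)[match(sampleinfo$patient,
                               levels(factor(sampleinfo$patient)))]
plotMDS(pseudoCounts, pch = patients, col = colConditions, xlim = c(-2, 2))
legend("topright", lwd = 2, col = brewer.pal(3, "Set2")[1:2],
       legend = levels(factor(sampleinfo$condition)))
legend("bottomright", pch = c(8, 15, 16),
       legend = levels(factor(sampleinfo$patient)))
dev.off()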
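The expression c(8, 15, 16)[match(x, levels(factor(x)))] used above gives each distinct value of x its own plotting symbol (and, for colConditions, its own colour), assigned in the sorted order of the factor levels. A minimal base R check, with a hypothetical sample table standing in for sampleinfo:

```r
# Hypothetical sample table with the same columns as sampleinfo
info <- data.frame(
  condition = c("normal", "tumor", "normal", "tumor", "normal", "tumor"),
  patient   = c("P1", "P1", "P2", "P2", "P3", "P3")
)

# One symbol per patient: levels are sorted, so P1 -> 8, P2 -> 15, P3 -> 16
pch_by_patient <- c(8, 15, 16)[match(info$patient, levels(factor(info$patient)))]

# Equivalent, shorter form using the factor's integer codes
stopifnot(identical(pch_by_patient,
                    c(8, 15, 16)[as.integer(factor(info$patient))]))

print(pch_by_patient)  # 8 8 15 15 16 16
```

Because the legend calls use the same levels(factor(...)) ordering, the symbols and colours in the legends line up with those on the plot.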

As shown in Figure 5.18, the samples clearly cluster by condition: the normal samples group together and the tumor samples group together. This indicates that the difference between the two groups is much larger than the differences within groups, and it is likely that the between-group difference is statistically